Add per-operation upstream override for REST API operations#2012
Add per-operation upstream override for REST API operations#2012mehara-rothila wants to merge 18 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds per-operation upstream overrides for REST operations: schema and generated models (ref-only per-op main/sandbox), validator checks (pattern, max length, existence, and timeout parsing), shared helpers (upstreamref lookup and timeout parsing, clusterkey API-level naming), transformer and XDS wiring to reuse upstream-definition clusters with basePath/timeout inheritance, RDC/xDS cluster naming stability, and extensive unit, integration, and BDD tests. Sequence Diagram(s)sequenceDiagram
participant Client
participant ManagementAPI
participant Transformer
participant XdsTranslator
participant UpstreamDefinitions
participant Envoy
Client->>ManagementAPI: deploy RestApi (may include per-op upstream.ref)
ManagementAPI->>Transformer: StoredConfig -> RuntimeDeployConfig (include per-op upstream refs)
Transformer->>UpstreamDefinitions: resolve per-op ref via upstreamref.FindByName
Transformer->>XdsTranslator: emit routes referencing definition cluster keys
XdsTranslator->>Envoy: create/register clusters (API-scoped clusterkey.APILevel / reused definition clusters)
Envoy-->>Client: traffic routed to correct backend (per-op or API-level)
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gateway/gateway-controller/api/management-openapi.yaml`:
- Around line 4139-4148: OperationUpstream currently allows an empty object
which is invalid; update the OperationUpstream schema (the OperationUpstream
object that defines properties main and sandbox) to require at least one of
those properties by adding an anyOf clause such as anyOf: - required: [main] -
required: [sandbox] (or an equivalent oneOf/anyOf expression) so the schema
enforces presence of main or sandbox and prevents {} being valid.
In `@gateway/gateway-controller/pkg/policy/builder.go`:
- Around line 185-186: The current condition treats any non-nil
op.Upstream.Sandbox as active; change it to perform a content-based check like
apiSandboxHasContent so empty sandbox objects are ignored. Introduce/compute a
per-op predicate (e.g., perOpSandboxHasContent) that mirrors the api-level
sandbox content test and use if apiSandboxHasContent || (op.Upstream != nil &&
perOpSandboxHasContent) to decide whether to append effectiveSandboxVHost.
In `@gateway/gateway-controller/pkg/xds/translator_test.go`:
- Around line 2061-2074: Update TestResolvePerOpUpstream_DedupSameURL to ensure
URL-stability by creating two distinct api.Upstream instances with different Url
values (e.g., "http://shared-svc:8080" and "http://shared-svc:8081") but the
same apiID, method, path and env, then call translator.resolvePerOpUpstream for
both and assert the returned cluster names are equal; keep the test name and use
the Translator.resolvePerOpUpstream and translator variable references so the
test still targets the same logic and verifies that cluster key derivation does
not depend on the exact URL string.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4012d4ee-58df-4773-b82e-fa11e65290d8
📒 Files selected for processing (16)
gateway/gateway-controller/api/management-openapi.yamlgateway/gateway-controller/pkg/api/management/generated.gogateway/gateway-controller/pkg/config/api_validator.gogateway/gateway-controller/pkg/config/validator_test.gogateway/gateway-controller/pkg/models/runtime_deploy_config.gogateway/gateway-controller/pkg/policy/builder.gogateway/gateway-controller/pkg/policy/builder_test.gogateway/gateway-controller/pkg/transform/restapi.gogateway/gateway-controller/pkg/transform/restapi_test.gogateway/gateway-controller/pkg/utils/commonutils.gogateway/gateway-controller/pkg/utils/commonutils_test.gogateway/gateway-controller/pkg/xds/translator.gogateway/gateway-controller/pkg/xds/translator_test.gogateway/it/features/per-op-upstream-basic.featuregateway/it/features/per-op-upstream-ref.featuregateway/it/features/per-op-upstream-validation.feature
- OperationUpstream schema now requires at least one of main or sandbox
via anyOf, rejecting empty {} at the schema layer as defense in depth
alongside the validator.
- Policy builder per-op sandbox check is now content-based (trims and
checks Url/Ref non-empty) matching the apiSandboxHasContent pattern
used elsewhere. Fixes asymmetric pointer-only check that would emit
sandbox policy chains for empty sandbox wrappers.
- TestResolvePerOpUpstream_DedupSameURL now uses two distinct URLs with
identical apiID|METHOD|path|env, asserting cluster name independence
from URL. Pins the EDS-stable design contract.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
gateway/gateway-controller/pkg/policyxds/policyxds_test.go (1)
120-120: 💤 Low valueConsider using a hash-like cluster key suffix for test realism.
The cluster key
"main_fixture"has the correctmain_prefix for an API-level main environment cluster, but the suffixfixturedoesn't reflect the production naming pattern introduced in this PR. According to the PR objectives, API-level clusters follow the pattern<env>_<sha256(apiID|env)[:8]>(e.g.,main_a1b2c3d4). While the current fixture name is functionally correct and clear, using a hash-like suffix such as"main_12ab34cd"would make the test more representative of production behavior.📝 Proposed update for more realistic fixture naming
Upstream: models.RouteUpstream{ - ClusterKey: "main_fixture", + ClusterKey: "main_12ab34cd", },UpstreamClusters: map[string]*models.UpstreamCluster{ - "main_fixture": { + "main_12ab34cd": { BasePath: "/",Also applies to: 132-132
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/gateway-controller/pkg/policyxds/policyxds_test.go` at line 120, Update the test fixture cluster key strings that currently use "main_fixture" to a realistic hash-like suffix such as "main_12ab34cd" to match the production pattern; locate the occurrences in the policyxds_test setup where ClusterKey: "main_fixture" is declared (the test fixture variables/constants used in Policy XDS tests) and replace them with "main_12ab34cd" (or another 8-char hex-like hash) to make the test names reflect the <env>_<sha256(... )[:8]> convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@gateway/gateway-controller/pkg/policyxds/policyxds_test.go`:
- Line 120: Update the test fixture cluster key strings that currently use
"main_fixture" to a realistic hash-like suffix such as "main_12ab34cd" to match
the production pattern; locate the occurrences in the policyxds_test setup where
ClusterKey: "main_fixture" is declared (the test fixture variables/constants
used in Policy XDS tests) and replace them with "main_12ab34cd" (or another
8-char hex-like hash) to make the test names reflect the <env>_<sha256(...
)[:8]> convention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8729c50c-d6aa-40f4-b0b0-efec0f9c903c
📒 Files selected for processing (13)
gateway/gateway-controller/cmd/controller/main.gogateway/gateway-controller/pkg/config/api_validator.gogateway/gateway-controller/pkg/policyxds/policyxds_test.gogateway/gateway-controller/pkg/transform/restapi.gogateway/gateway-controller/pkg/transform/restapi_test.gogateway/gateway-controller/pkg/utils/clusterkey/clusterkey.gogateway/gateway-controller/pkg/utils/clusterkey/clusterkey_test.gogateway/gateway-controller/pkg/utils/commonutils_test.gogateway/gateway-controller/pkg/xds/translator.gogateway/gateway-controller/pkg/xds/translator_test.gogateway/it/features/api-level-eds-stable.featuregateway/it/features/per-op-upstream-basic.featuregateway/it/suite_test.go
💤 Files with no reviewable changes (1)
- gateway/gateway-controller/pkg/utils/commonutils_test.go
✅ Files skipped from review due to trivial changes (2)
- gateway/gateway-controller/cmd/controller/main.go
- gateway/it/suite_test.go
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
gateway/gateway-controller/pkg/config/validator_test.go (1)
773-796: ⚡ Quick winIsolate invalid-timeout cases with subtests.
Using
requireinside the loop means the first failing case can stop execution before the next case is checked. Wrapping eachbadTimeoutint.Run(...)keeps both cases independently validated and easier to diagnose.Proposed test refactor
- for _, badTimeout := range []string{"0s", "-5s"} { + for _, badTimeout := range []string{"0s", "-5s"} { + t.Run(badTimeout, func(t *testing.T) { connect := badTimeout definitions := &[]api.UpstreamDefinition{ { Name: "my-upstream", Timeout: &api.UpstreamTimeout{ Connect: &connect, }, Upstreams: []struct { Url string `json:"url" yaml:"url"` Weight *int `json:"weight,omitempty" yaml:"weight,omitempty"` }{ { Url: "http://backend:8080", }, }, }, } errors := validator.validateUpstreamDefinitions(definitions) require.Len(t, errors, 1, "timeout %q must be rejected", badTimeout) assert.Equal(t, "spec.upstreamDefinitions[0].timeout.connect", errors[0].Field) assert.Contains(t, errors[0].Message, "must be a positive duration") + }) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/gateway-controller/pkg/config/validator_test.go` around lines 773 - 796, The test loop in validator_test.go uses require inside a for-range which can abort the whole loop on first failure; refactor by wrapping each badTimeout iteration in a subtest using t.Run(fmt.Sprintf("timeout=%s", badTimeout), func(t *testing.T) { ... }) and move the assertions (require.Len, assert.Equal, assert.Contains) into that subtest, capturing the loop variable (e.g., tt := badTimeout) before using it to build the definitions passed to validator.validateUpstreamDefinitions so each case runs and reports independently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gateway/gateway-controller/api/management-openapi.yaml`:
- Around line 4136-4137: The schemas that currently use a $ref with a sibling
description (referencing "`#/components/schemas/RestAPIOperationUpstream`") must
be changed so the description is not a sibling of $ref; instead wrap the
reference in an allOf and move the description onto the enclosing schema object.
Concretely, replace occurrences where a property has "$ref:
'`#/components/schemas/RestAPIOperationUpstream`'" alongside "description" by
creating an object with "description: <same text>" and "allOf: [{ $ref:
'`#/components/schemas/RestAPIOperationUpstream`' }]" (apply this change for every
place referencing RestAPIOperationUpstream in the file).
In `@gateway/gateway-controller/pkg/config/api_validator.go`:
- Around line 656-667: The per-operation upstream ref checks (using
upstreamRefRegex and length >100 in the validate path that returns
ValidationError for field/refName) are stricter than names allowed by
validateUpstreamDefinitions, causing valid definition names to become
unreferencable; to fix, apply the same validation rules to
spec.upstreamDefinitions[*].name inside validateUpstreamDefinitions (reuse
upstreamRefRegex and the 100-character limit and return a ValidationError with
the same message format when a definition name violates them) OR remove the
extra constraints from the per-operation check so both places use the same
contract; update validateUpstreamDefinitions to reference the same
upstreamRefRegex and error messages (Field and Message) so names and refs are
aligned.
---
Nitpick comments:
In `@gateway/gateway-controller/pkg/config/validator_test.go`:
- Around line 773-796: The test loop in validator_test.go uses require inside a
for-range which can abort the whole loop on first failure; refactor by wrapping
each badTimeout iteration in a subtest using t.Run(fmt.Sprintf("timeout=%s",
badTimeout), func(t *testing.T) { ... }) and move the assertions (require.Len,
assert.Equal, assert.Contains) into that subtest, capturing the loop variable
(e.g., tt := badTimeout) before using it to build the definitions passed to
validator.validateUpstreamDefinitions so each case runs and reports
independently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a0b21e0b-ca37-459a-b596-04dce4de1349
📒 Files selected for processing (18)
gateway/gateway-controller/api/management-openapi.yamlgateway/gateway-controller/pkg/api/management/generated.gogateway/gateway-controller/pkg/config/api_validator.gogateway/gateway-controller/pkg/config/validator_test.gogateway/gateway-controller/pkg/policy/builder.gogateway/gateway-controller/pkg/policy/builder_test.gogateway/gateway-controller/pkg/transform/restapi.gogateway/gateway-controller/pkg/transform/restapi_test.gogateway/gateway-controller/pkg/utils/upstreamref/upstreamref.gogateway/gateway-controller/pkg/utils/upstreamref/upstreamref_test.gogateway/gateway-controller/pkg/xds/translator.gogateway/gateway-controller/pkg/xds/translator_test.gogateway/gateway-controller/tests/integration/storage_test.gogateway/gateway-runtime/policy-engine/internal/kernel/translator.gogateway/gateway-runtime/policy-engine/internal/kernel/translator_test.gogateway/it/features/per-op-upstream-basic.featuregateway/it/features/per-op-upstream-ref.featuregateway/it/features/per-op-upstream-validation.feature
🚧 Files skipped from review as they are similar to previous changes (4)
- gateway/gateway-controller/pkg/policy/builder.go
- gateway/gateway-controller/pkg/policy/builder_test.go
- gateway/gateway-controller/pkg/transform/restapi.go
- gateway/gateway-controller/pkg/xds/translator.go
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
4cfd850 to
4caca38
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
Describe the current wiring (only policy xDS receives the transformer registry; main Envoy xDS translates RestAPI configs directly) instead of implying the xDS translator cannot use transformers.
79cff8b to
bc4fb65
Compare
…op sandbox rejection
|
@coderabbitai I'd like your opinion on a design decision in the operation-level upstream schema. Context (gateway-controller/api/management-openapi.yaml):
Question: is restricting per-operation overrides to ref-only (no inline Please weigh in on these specific angles by looking at the actual code:
Please confirm whether the ref-only restriction is sound, or flag concerns. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Trace cluster building in xDS translator for per-op refs vs URL-based upstreams
rg -n "resolvePerOpDefinitionCluster\|resolveUpstreamCluster\|sanitizeEnvoyClusterName\|UpstreamDefinitionClusterPrefix\|perOpDefinitionClusterKey" --type go -A 5 -B 2
echo "---CLUSTERKEY---"
cat gateway/gateway-controller/pkg/utils/clusterkey/clusterkey.go 2>/dev/null || fd -i "clusterkey.go" --exec cat {}
echo "---UPSTREAMREF---"
cat gateway/gateway-controller/pkg/utils/upstreamref/upstreamref.go 2>/dev/null || fd -i "upstreamref.go" --exec cat {}
echo "---XDS_TRANSLATOR_CLUSTER_BUILD---"
rg -n "createCluster\|ConnectTimeout\|UpstreamDefinition\|basePath\|ClusterKey\|EnvoyClusterName" gateway/gateway-controller/pkg/xds/translator.go -A 3 -B 2
echo "---OPERATION_UPSTREAM_SCHEMA---"
rg -n "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|ref-only\|inline" gateway/gateway-controller/api/management-openapi.yaml -A 8 -B 2
echo "---UPSTREAM_DEF_FIELDS---"
rg -n "basePath\|BasePath\|timeout\|Timeout\|weighted\|Weighted\|upstreams\|Upstreams" gateway/gateway-controller/pkg/api/management/generated.go -B 2 -A 2 | head -120Length of output: 5710 🏁 Script executed: #!/bin/bash
# Find actual file paths
fd -i "translator.go" --type f | grep -v test
fd -i "management-openapi.yaml" --type f
fd -i "generated.go" --type f | grep management
echo "---SEARCH FOR KEY SYMBOLS---"
rg -rn "resolvePerOpDefinitionCluster\|UpstreamDefinitionClusterPrefix\|perOpDefinitionClusterKey" --type go -A 8 -B 2
echo "---UPSTREAM DEF STRUCT---"
rg -rn "type UpstreamDefinition\|BasePath\|basePath\|Weighted\|weighted" --type go -A 3 -B 1 | head -80
echo "---OPERATION UPSTREAM OPENAPI---"
rg -rn "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|ref-only" -A 8 -B 2 | head -120
echo "---CLUSTER BUILD IN TRANSLATOR---"
rg -rn "createCluster\|ConnectTimeout\|ClusterKey\|EnvoyClusterName\|UpstreamDefinition" --type go -A 4 -B 2 | grep -v "_test.go" | head -200Length of output: 1329 🏁 Script executed: #!/bin/bash
echo "---TRANSLATOR---"
cat gateway/gateway-controller/pkg/xds/translator.go
echo "---OPENAPI YAML OPERATION UPSTREAM---"
grep -n "RestAPIOperation\|Operation\|upstream\|ref\|basePath\|timeout\|weighted" gateway/gateway-controller/api/management-openapi.yaml | head -120
echo "---UPSTREAM DEFINITION STRUCT IN GENERATED---"
grep -n "UpstreamDefinition\|BasePath\|Weighted\|Timeout\|Connect\|UpstreamTimeout" gateway/gateway-controller/pkg/api/management/generated.go | head -80
echo "---RESTAPI TRANSFORMER KEY SECTIONS---"
grep -n "perOp\|per_op\|PerOp\|DefinitionCluster\|definitionCluster\|UpstreamDef\|upstreamDef\|clusterkey\|upstreamref" gateway/gateway-controller/pkg/transform/restapi.go | head -80Length of output: 123989
1. Envoy/EDS cluster stabilityThe cluster name for a per-op override is computed as: // translator.go – resolvePerOpDefinitionCluster
clusterName := constants.UpstreamDefinitionClusterPrefix + kind + "_" + apiID + "_" + sanitizeUpstreamDefinitionName(definition.Name)The name is derived from the definition name, never from the URL. The upstream-definitions loop creates this cluster unconditionally, so any number of operations sharing the same 2. Feature parity
If a raw URL were permitted at the operation level, those users would silently get a stripped-down backend configuration. The ref-only constraint ensures all per-op overrides have full parity with what the definition system provides. 3. Validation / single source of truthBoth the xDS translator ( // upstreamref.go
func FindByName(ref string, defs *[]api.UpstreamDefinition) (*api.UpstreamDefinition, error) { ... }URL validation (scheme, host, parse errors) is performed once per definition entry, not per operation. If inline URLs were allowed at the operation level, URL parsing and scheme validation would need to be duplicated in 4. Downsides and edge casesThe one genuine downside is boilerplate for trivial cases: if a user wants to route a single operation to a different backend, they must declare an Beyond that, the main structural consideration is the asymmetry between API-level ( Summary: The ref-only restriction is sound. It preserves EDS-stable cluster identity, delivers full feature parity (basePath, timeout, weighted load balancing), and eliminates a duplicate validation and cluster-naming code path. The only trade-off — mandatory definition declaration even for single-use refs — is reasonable given the benefits. If you find that single-use boilerplate becomes a friction point in practice, it could be addressed later by adding a convenience |
API-level main and sandbox clusters are now named "<env>_<fragment>", where fragment is the first 12 bytes of sha256(apiID) in hex, via clusterkey.APILevelName. Main and sandbox share the fragment and are set apart by the env prefix. The name excludes the URL, so a URL edit updates the same named cluster instead of renaming it, keeping routes and cluster stats keys continuous. Adds the API-level url-stable IT scenarios with their Envoy admin step definitions, an LLM cluster-naming unit test, and updates the controller data-model spec to match.
|
@coderabbitai review the per-operation upstream schema ( |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "===OPENAPI YAML - FULL OPERATION UPSTREAM SCHEMAS==="
grep -n "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|Operation:\|operations:" gateway/gateway-controller/api/management-openapi.yaml | head -60
echo ""
echo "===FULL SCHEMA BLOCKS==="
python3 - <<'EOF'
import re
with open("gateway/gateway-controller/api/management-openapi.yaml") as f:
content = f.read()
# Extract RestAPIOperationUpstream and RestAPIOperationUpstreamTarget blocks
for name in ["RestAPIOperationUpstream:", "RestAPIOperationUpstreamTarget:", "Operation:"]:
idx = content.find(f" {name}")
if idx == -1:
idx = content.find(name)
if idx != -1:
print(f"\n===SCHEMA: {name}===")
print(content[idx:idx+600])
print("...")
EOF
echo ""
echo "===GENERATED.GO - RestAPIOperationUpstream types==="
grep -n "RestAPIOperationUpstream\|RestAPIOperationUpstreamTarget\|union json\|MarshalJSON\|UnmarshalJSON\|AsRestAPI\|FromRestAPI\|MergeRestAPI" gateway/gateway-controller/pkg/api/management/generated.go | head -80
echo ""
echo "===GENERATED.GO - struct bodies==="
python3 - <<'EOF'
with open("gateway/gateway-controller/pkg/api/management/generated.go") as f:
content = f.read()
for name in ["RestAPIOperationUpstream", "RestAPIOperationUpstreamTarget", "RestAPIOperationUpstream0", "RestAPIOperationUpstream1"]:
idx = content.find(f"type {name} struct")
if idx == -1:
idx = content.find(f"type {name} =")
if idx != -1:
print(f"\n===TYPE: {name}===")
print(content[idx:idx+300])
EOF
echo ""
echo "===VALIDATOR - operation upstream validation==="
grep -n "op\.Upstream\|Upstream\.Main\|Upstream\.Sandbox\|upstreamRef\|OperationUpstream\|RestAPIOperationUpstream" gateway/gateway-controller/pkg/config/api_validator.go -A 5 -B 2 | head -160
echo ""
echo "===TRANSFORMER - per-op upstream consumption==="
grep -n "op\.Upstream\|Upstream\.Main\|Upstream\.Sandbox\|PerOp\|perOp\|perOpDefinitionClusterKey" gateway/gateway-controller/pkg/transform/restapi.go -A 5 -B 2 | head -160
echo ""
echo "===XDS TRANSLATOR - per-op upstream consumption==="
grep -n "op\.Upstream\|Upstream\.Main\|Upstream\.Sandbox\|resolvePerOpDefinitionCluster\|PerOp\|perOp" gateway/gateway-controller/pkg/xds/translator.go -A 5 -B 2 | head -160Length of output: 23172
Overall verdict: mostly well-formed and consistent, with one structural concern worth addressing.What's correct and consistent
Concern:
|
| allOf: | ||
| - $ref: "#/components/schemas/RestAPIOperationUpstreamTarget" |
Apply the upstream-definition name contract (minLength 1, maxLength 100, pattern ^[a-zA-Z0-9\-_]+$) to the API-level Upstream.ref so it validates the same as UpstreamDefinition.name and the per-operation ref. Reference RestAPIOperationUpstreamTarget directly from the per-operation main and sandbox instead of wrapping it in allOf, and regenerate the management model.
Define the upstream-definition name contract once in a new UpstreamReference schema (string, minLength 1, maxLength 100, pattern ^[a-zA-Z0-9\-_]+$) and reference it from UpstreamDefinition.name, Upstream.ref, and the per-operation RestAPIOperationUpstreamTarget.ref instead of repeating the constraints inline. Regenerate the management model; oapi-codegen emits UpstreamReference as a string alias, so consuming code is unchanged.
|
@coderabbitai Investigate the policy-building flow in the wso2/api-platform gateway-controller (Go) on the Context , policy building appears to exist in two places:
A grep across the whole gateway directory found NO callers of Please verify independently and report, with file:line evidence:
Give your own independent verdict. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "===1. CALLERS OF DerivePolicyFromAPIConfig==="
rg -rn "DerivePolicyFromAPIConfig" --type go
echo ""
echo "===2. POLICY_STORE WRITES==="
rg -rn "policy_store\|PolicyStore\|StoredPolicyConfig\|SetPolicy\|SavePolicy\|WritePolicy\|StorePolicy" --type go | grep -v "_test.go" | head -80
echo ""
echo "===3. FULL BUILDER.GO==="
cat gateway/gateway-controller/pkg/policy/builder.go
echo ""
echo "===4. FULL POLICY_STORE.GO==="
fd -i "policy_store.go" --exec cat {}
echo ""
echo "===5. DEPLOY EVENT → TRANSFORM CHAIN==="
rg -rn "RestAPITransformer\|Transform\(\|Transform\b\|\.Transform" --type go | grep -v "_test.go" | head -80
echo ""
echo "===6. POLICYXDS CONSUMPTION==="
fd -i "policyxds" --type d
rg -rn "PolicyChains\|policyxds\|PushPolicy\|BuildPolicy" --type go | grep -v "_test.go" | head -80
echo ""
echo "===7. MAIN.GO OR DEPLOYMENT SERVICE WIRING==="
fd -i "main.go" --type f | xargs grep -l "policy\|Policy\|transformer\|Transformer" 2>/dev/null
rg -rn "APIDeploymentService\|deploymentService\|DeployAPI\|HandleDeploy\|RestAPITransformer\|TransformerRegistry\|transformerRegistry" --type go | grep -v "_test.go" | head -80
echo ""
echo "===8. HOW SANDBOX GATING APPEARS IN BOTH BUILDER AND RESTAPI==="
grep -n "perOpSandboxHasContent\|apiSandboxHasContent\|hasSandbox\|SandboxHas" gateway/gateway-controller/pkg/policy/builder.go gateway/gateway-controller/pkg/transform/restapi.go 2>/dev/nullLength of output: 24393 🏁 Script executed: #!/bin/bash
echo "===1. CALLERS OF 'n' FUNCTION (non-test) FROM POLICY PACKAGE==="
rg -rn "\bpolicybuilder\.n\b\|\bpolicy\.n\b\|DerivePolicyFromAPIConfig" --type go | grep -v "_test.go"
echo ""
echo "===2. CONTROLLER MAIN.GO FULL==="
cat gateway/gateway-controller/cmd/controller/main.go
echo ""
echo "===3. POLICYXDS PACKAGE CONTENTS==="
fd --type f --extension go "." gateway/gateway-controller/pkg/policyxds/ | sort | xargs -I{} sh -c 'echo "---FILE: {}---" && cat {}'
echo ""
echo "===4. RESTAPITRANSFORMER - WHERE IS IT INSTANTIATED AND CALLED==="
rg -rn "RestAPITransformer\|NewRestAPITransformer\|transformerRegistry\|TransformerRegistry\|\.Transform(" --type go | grep -v "_test.go"
echo ""
echo "===5. POLICYSTORE.SET CALLERS (production)==="
rg -rn "\.Set(" --type go | grep -i "policy\|store" | grep -v "_test.go"
echo ""
echo "===6. POLICYXDS MAIN ENTRYPOINT AND HOW IT GETS RDC==="
fd --type f --extension go "." gateway/gateway-controller/pkg/policyxds/ | head -5
echo ""
echo "===7. HOW STOREDPOLICYCONFIG IS PRODUCED AND CONSUMED==="
rg -rn "StoredPolicyConfig\|StoredPolicyConf" --type go | grep -v "_test.go" | head -40
echo ""
echo "===8. WHAT PRODUCES rdc.PolicyChains AND WHO CONSUMES==="
rg -rn "PolicyChains\|\.PolicyChains" --type go | grep -v "_test.go" | head -40
echo ""
echo "===9. APISERVER HANDLER PRODUCTION CODE==="
fd --type f --extension go "." gateway/gateway-controller/pkg/api/handlers/ | grep -v test | sort | head -20
rg -rn "DeployAPI\|HandleDeploy\|apiDeployment\|APIDeployment\|DeploymentService\|deploymentService\|policybuilder\|DerivePolicyFrom" --type go | grep -v "_test.go" | head -50Length of output: 155400
1. Is
|
| Caller | File |
|---|---|
policybuilder.n(...) |
pkg/api/handlers/policy_ordering_test.go |
policybuilder.n(...) |
pkg/api/handlers/handlers_test.go |
n(...) |
pkg/policy/builder_test.go |
policybuilder.n(...) |
cmd/controller/main_test.go |
Zero matches in any non-test file.
PolicyStore (pkg/storage/policy_store.go) also confirms this: it has a Set method, but no production code calls it. The doc comment claiming the function is used by "APIDeploymentService", "APIServer handlers", and "main.go startup" is factually incorrect and will mislead future maintainers.
2. What is the actual active policy-building path?
The active path, traced from cmd/controller/main.go, is:
main.go
→ transform.NewRestAPITransformer(...) // line ~307
→ transform.NewRegistry(restTransformer, ...) // line ~310
→ policyManager.SetTransformers(transformerRegistry) // line ~312
→ loadRuntimeConfigsFromExistingAPIConfigurations(...) // startup hydration
→ transformerRegistry.Transform(cfg) // per each StoredConfig
→ [on deploy] policyManager.UpsertAPIConfig(cfg) // pkg/policyxds/manager.go
→ pm.transformers.Transform(cfg) // calls RestAPITransformer.Transform
→ buildPolicyChain(...) // pkg/transform/restapi.go
→ rdc.PolicyChains populated
→ runtimeStore.Set(key, rdc)
→ policySnapshotManager.UpdateSnapshot(ctx)
→ translator.TranslateRuntimeConfigs(rdcs)
→ policyCache.SetResources(...) // pushed to policy engine via xDS
RestAPITransformer.Transform in pkg/transform/restapi.go is the sole production builder for rdc.PolicyChains. builder.go / n is never on this path.
3. Is the builder.go sandbox-gating change on the active path?
No. The perOpSandboxHasContent predicate added in builder.go (Line 185) modifies the function n, which has zero production callers. That change is on dead code.
The sandbox gating that is actually exercised at runtime lives in pkg/transform/restapi.go:
// restapi.go – active path
apiSandboxHasContent := ... // Line 145
hasSandbox := apiSandboxHasContent
// per-op override loop sets hasSandbox = true when op.Upstream.Sandbox != nil // Line 151-156
if apiSandboxHasContent || (op.Upstream != nil && op.Upstream.Sandbox != nil) { // Line 229
// append sandbox vhost route
}This logic is called through policyManager.UpsertAPIConfig on every deploy. The xds/translator.go path for per-op sandbox is also on the active Envoy xDS path and is correctly gated.
4. Bottom line: is the per-op work correctly wired?
Yes for the active path; no for builder.go.
| Location | Active in production? | Per-op wiring correct? |
|---|---|---|
pkg/transform/restapi.go (→ rdc.PolicyChains) |
✅ Yes | ✅ Yes — op.Upstream.Main/.Sandbox handled |
pkg/xds/translator.go |
✅ Yes | ✅ Yes — resolvePerOpDefinitionCluster wired |
pkg/policy/builder.go (n) |
❌ No — dead code | N/A |
There is no functional gap at runtime. The per-op sandbox override routes are built correctly through restapi.go and pushed to the policy engine via policyxds.
The actionable issue is builder.go itself. It carries a misleading doc comment claiming production usage, is tested as if it were a live component, and accumulates changes (including the perOpSandboxHasContent addition in this PR) that have no runtime effect. This creates ongoing maintenance risk: future per-op features added only to builder.go will silently not deploy, and the test coverage against it creates false confidence. The function should either be removed or the doc comment corrected to mark it explicitly as test-only.
Would you like to open a follow-up issue to track removal or correction of pkg/policy/builder.go (n) and pkg/storage/policy_store.go?
pkg/policy/builder.go (DerivePolicyFromAPIConfig) has no production callers, so the per-op sandbox vhost gating added there had no runtime effect. The active per-op policy path runs through transform/restapi.go and pkg/policyxds. Revert builder.go and its test to base to keep the change on the active code.
The per-op upstream comments in the RestAPI transform and xDS translator ran 5 to 7 lines each, well above the surrounding one to two line style. Condense them to the essential invariants and move the longer rationale to the PR description. No behavior change.
The per-op change renamed a TranslateRuntimeConfigs fixture cluster key that no assertion depends on. Restore the original key so the per-op diff leaves this test untouched.
- Reject compound and signed connect timeouts with a clear, discoverable error, and drop the leading-sign allowance so the validator matches the published OpenAPI pattern. - Share per-op definition cluster naming through clusterkey.DefinitionName so both xDS paths derive the name identically; remove the two duplicate sanitizers. - Drop the unused UpstreamCluster.ConnectTimeout field and its dormant reader; the data plane resolves timeouts via the legacy translator. - Trim the upstream ref lookup symmetrically and correct the EnvoyClusterName doc comment.
|
@coderabbitai review I've applied a round of fixes from an external review on this per-operation upstream PR. Please give your own independent assessment of whether the current state is correct. Verify each of these against the actual code and flag anything wrong, inconsistent, or newly broken, with
Don't assume the changes are right; give a skeptical, independent read. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "===1. CONNECT-TIMEOUT VALIDATION==="
echo "---validator.go timeout regex and error messages---"
rg -n "timeout\|connect\|regex\|Regexp\|ParseDuration\|ParseConnect\|^\\\d\|ms|s|m|h\|unit\|unsigned\|single" gateway/gateway-controller/pkg/config/api_validator.go | head -60
echo ""
echo "---upstreamref.go ParseConnectTimeout---"
fd -i "upstreamref.go" --type f | grep -v test | xargs cat
echo ""
echo "---OpenAPI yaml pattern for timeout---"
rg -n "connect\|timeout\|pattern\|format\|Timeout" gateway/gateway-controller/api/management-openapi.yaml | head -40
echo ""
echo "===2. clusterkey.DefinitionName==="
echo "---clusterkey package full contents---"
fd -i "clusterkey.go" --type f | grep -v test | xargs cat
echo ""
echo "---old sanitizer presence check---"
rg -rn "sanitizeEnvoyClusterName\|sanitizeUpstreamDefinitionName\|sanitizeClusterName" --type go | grep -v "_test.go"
echo ""
echo "---DefinitionName callers across codebase---"
rg -rn "DefinitionName\|clusterkey\.DefinitionName" --type go | grep -v "_test.go"
echo ""
echo "---old cluster naming in translator.go---"
rg -n "UpstreamDefinitionClusterPrefix\|upstream_\|kind.*apiID\|apiID.*kind\|definition.*Name\|Name.*definition\|sanitize" gateway/gateway-controller/pkg/xds/translator.go | head -30
echo ""
echo "---old cluster naming in restapi.go---"
rg -n "UpstreamDefinitionClusterPrefix\|upstream_\|perOpDefinitionClusterKey\|DefinitionName\|sanitize" gateway/gateway-controller/pkg/transform/restapi.go | head -30
echo ""
echo "===3. UpstreamCluster.ConnectTimeout removed==="
echo "---models runtime_deploy_config.go---"
fd -i "runtime_deploy_config.go" --type f | grep -v test | xargs cat
echo ""
echo "---any remaining ConnectTimeout references in non-test go files---"
rg -rn "ConnectTimeout\|connect_timeout" --type go | grep -v "_test.go"
echo ""
echo "===4. Shared UpstreamReference schema==="
echo "---OpenAPI yaml UpstreamReference schema---"
python3 - <<'EOF'
import re
try:
with open("gateway/gateway-controller/api/management-openapi.yaml") as f:
content = f.read()
# Find UpstreamReference schema
idx = content.find("UpstreamReference:")
if idx != -1:
print(content[idx:idx+400])
else:
print("UpstreamReference not found in yaml")
# Find all usages of UpstreamReference
for m in re.finditer(r'UpstreamReference', content):
start = max(0, m.start()-100)
end = min(len(content), m.end()+200)
print(f"\n---USAGE at char {m.start()}---")
print(content[start:end])
except FileNotFoundError:
print("File not found")
EOF
echo ""
echo "---rg for UpstreamReference in yaml---"
rg -n "UpstreamReference" gateway/gateway-controller/api/management-openapi.yaml -A 5 -B 5
echo ""
echo "---validator usage of upstreamRefRegex and name validation---"
rg -n "upstreamRefRegex\|UpstreamReference\|nameRegex\|NameRegex\|RefPattern\|refPattern" --type go | grep -v "_test.go" | head -40Length of output: 19523 🏁 Script executed: #!/bin/bash
echo "===FULL VALIDATOR FILE - timeout and per-op sections==="
grep -n "timeout\|Timeout\|connect\|Connect\|ParseDuration\|regex\|Regex\|pattern\|Pattern\|upstreamRef\|UpstreamRef\|upstreamDefinitions\|UpstreamDefinition\|ref\|Ref" gateway/gateway-controller/pkg/config/api_validator.go | head -100
echo ""
echo "===FULL CLUSTERKEY FILE==="
cat gateway/gateway-controller/pkg/utils/clusterkey/clusterkey.go
echo ""
echo "===FULL RESTAPI.GO - cluster key usage==="
grep -n "clusterkey\|DefinitionName\|UpstreamDefinitionClusterPrefix\|constants\." gateway/gateway-controller/pkg/transform/restapi.go | head -40
echo ""
echo "===FULL TRANSLATOR.GO - cluster key usage==="
grep -n "clusterkey\|DefinitionName\|UpstreamDefinitionClusterPrefix\|constants\." gateway/gateway-controller/pkg/xds/translator.go | head -40
echo ""
echo "===CONSTANTS FILE==="
fd -i "constants.go" --type f | grep -i "gateway-controller" | xargs cat 2>/dev/null | head -60
echo ""
echo "===FULL RUNTIME DEPLOY CONFIG==="
cat gateway/gateway-controller/pkg/models/runtime_deploy_config.go
echo ""
echo "===POLICYXDS - ConnectTimeout usage==="
rg -rn "ConnectTimeout\|connect_timeout\|connectTimeout" --type go | grep -v "_test.go"
echo ""
echo "===OPENAPI YAML - full upstream and operation sections (500 lines around key areas)==="
python3 - <<'EOF'
try:
with open("gateway/gateway-controller/api/management-openapi.yaml") as f:
lines = f.readlines()
for i, line in enumerate(lines):
if any(kw in line for kw in ["UpstreamReference", "UpstreamDefinition", "RestAPIOperationUpstream", "Upstream:", "timeout", "connect", "pattern:", "UpstreamTimeout"]):
start = max(0, i-2)
end = min(len(lines), i+10)
print(f"---Line {i+1}---")
print("".join(lines[start:end]))
except FileNotFoundError:
print("File not found")
EOFLength of output: 48950
1. Connect-timeout validationVerdict: Correct and consistent with the OpenAPI pattern.
connectTimeoutRegex: regexp.MustCompile(`^\d+(\.\d+)?(ms|s|m|h)$`),
pattern: '^\d+(\.\d+)?(ms|s|m|h)$'These are identical. The validation order (Lines 380–397) is correct:
One observation to be aware of: 2.
|
| Schema | Field | Line |
|---|---|---|
UpstreamDefinition |
name |
4226 |
Upstream |
ref |
4277 |
RestAPIOperationUpstreamTarget |
ref |
4343 |
The validator's upstreamRefRegex (^[a-zA-Z0-9\-_]+$) is an exact match with the OpenAPI pattern. The max-100/non-empty checks in the validator align with maxLength: 100 / minLength: 1 in the schema.
Using UpstreamReference for both definition name and for ref values creates a useful structural invariant: any definition name that passes validation is guaranteed to be directly referenceable from a per-op upstream target without a separate character-set conversion. This is sound.
Previously noted concern still open
The anyOf union boilerplate in generated.go (RestAPIOperationUpstream0/1, MarshalJSON delegating to union json.RawMessage) was raised in the prior review. That was not part of this fix set — the anyOf is still in the schema (yaml Line 4327) and the union code is still generated. The MarshalJSON-null risk on the GET response path remains.
Cover the upstreamDefinition name contract (max length and allowed pattern) in the validator, and add xDS-path tests for a per-op main override reusing the referenced definition cluster and for the same-vhost rejection guard.
Summary
Adds a per-operation
upstreamoverride on REST API operations: an operation can route itsmainand/orsandboxtraffic to a different backend than the API-level upstream. Per-op targets are ref-only: each references a named entry inspec.upstreamDefinitionsrather than carrying an inline URL. Operations without a per-op upstream fall back to the API-level upstream, exactly as before.A per-op route reuses the referenced
upstreamDefinition's cluster (one cluster per definition, so operations sharing arefshare it) rather than minting a cluster per operation. API-levelmain/sandboxcluster names are URL-stable (derived from a hash of the API's identity, not the URL). Both are URL-independent, so URL edits update Envoy endpoints in place instead of recreating clusters.Why
Real-world APIs often need individual operations to reach different backend services. Today that needs path-based workarounds or splitting into multiple APIs. This lets the API definition declare per-operation routing directly, while backend URLs stay defined once in
upstreamDefinitionsand are referenced by name.Design decisions
Ref-only at the operation level.
operations[].upstream.main/.sandboxcarry only arefto a namedupstreamDefinition, with no inlineurland no per-target timeout/hostRewrite. Backends are declared once inspec.upstreamDefinitions; connect timeout lives on the definition. (API-levelupstreamstill accepts eitherurlorref, unchanged.)Per-op routes reuse the definition cluster. A per-op
refdoes not create a new cluster. It points the operation's route at the cluster the referencedupstreamDefinitionalready owns (upstream_<kind>_<apiID>_<defName>, built once per definition). N operations sharing arefcost one cluster, not N, and the route inherits the definition's authoritative base path.cluster_headerstays on with that cluster as the default, so a dynamic-endpoint policy can still override the operation (precedence: op-policy > api-policy > per-op ref > api-level upstream).URL-stable API-level cluster naming. API-level
main/sandboxcluster names derive from the first 12 bytes ofsha256(apiID)(24 hex), viaclusterkey.APILevelName(env, apiID), producingmain_<fragment>/sandbox_<fragment>. Theenvis a plaintext prefix, not part of the hash, somainandsandboxshare the same fragment. The URL is intentionally excluded, so a URL edit keeps the same cluster name and Envoy updates its endpoints in place (no connection drain) rather than recreating the cluster. The clusters areSTRICT_DNSwith an inline load assignment. Rollout: a one-time cluster rebuild on the first config push after upgrade (URL-derived to identity-based names), handled gracefully by Envoy's CDS warming; stable thereafter.Single source of truth. Ref resolution and connect-timeout parsing go through
pkg/utils/upstreamref(FindByName,ParseConnectTimeout); API-level cluster-name hashing goes throughpkg/utils/clusterkey(APILevelreturns the identity fragment,APILevelNamejoins the env prefix). Both are consumed by the validator, the RDC transformer, and the xDS translator so they cannot drift.What changed (gateway-controller layer)
management-openapi.yaml): optionalupstreamonOperationresolving toRestAPIOperationUpstream(main/sandbox), each a ref-onlyRestAPIOperationUpstreamTarget. Wrapper and leaf both locked withadditionalProperties: false.api_validator.go): rejects an emptyupstream: {}wrapper; requires at least one ofmain/sandbox; rejects arefthat does not resolve; and enforces one name contract (^[a-zA-Z0-9\-_]+$, max 100 chars) on both per-op refs andupstreamDefinitionnames so every valid definition name is referenceable.transform/restapi.go) and xDS translator (xds/translator.go): per-op routes reuse the referenced definition's cluster (upstream_<kind>_<apiID>_<defName>, built once per definition); API-level clusters use URL-stable namesmain_<fragment>/sandbox_<fragment>; operations without a per-op upstream fall back to the API-level cluster.pkg/utils/upstreamref,pkg/utils/clusterkey): new leaf packages.upstreamrefhandles ref resolution and connect-timeout parsing; the stdlib-onlyclusterkeyhandles URL-stable API-level cluster-key hashing (APILevel/APILevelName).Test coverage
pkg/config,pkg/transform,pkg/xds,pkg/policy,pkg/utils/upstreamref, andpkg/utils/clusterkey: deterministic hash tests (APILevel), validator rejection tests, URL-stable cluster-name contract tests (same name across a URL edit), and definition-cluster reuse tests (operations sharing arefshare one cluster).per-op-upstream-basic(9),per-op-upstream-ref(6),per-op-upstream-validation(6),api-level-url-stable(6)) covering API-level fallback, per-op main/sandbox ref overrides, validation rejections, dynamic-endpoint precedence over per-op refs, and URL-stable in-place endpoint updates. All passing against a local docker-compose stack.Example